ABSTRACT

This thesis addresses the problem of small user groups being forced to use input data collected and processed by sources outside their span of control. Specifically, the use of an active data dictionary to locally validate such input data is examined. The thesis proceeds from a general review of data validation techniques and criteria, through an examination of data dictionaries, to an illustration of how an active data dictionary can be configured to act as a "data filter" for input data.

Key initial planning and design steps are set forth, including requirements analysis, data definition, and initial logical design. A checklist of questions to answer during each of these activities is included.

The concepts discussed in the paper are then applied to a specific case (DCSPLANS Branch, U.S. Army Military Personnel Center, Alexandria, VA) resulting in a "data filter" structure diagram that is tailored to the DCSPLANS environment and its unique validation needs.


TABLE OF CONTENTS

I. INTRODUCTION
   A. CONTROL OF DATA
   B. DCSPLANS, MILPERCEN
   C. THESIS METHODOLOGY

II. INPUT VALIDATION
   A. GENERAL DESCRIPTION
   B. VALIDATION TECHNIQUES
      1. Category
      2. Transaction Validation
      3. Format Checks
      4. Reasonableness Checks
   C. EDIT AND VALIDATION RULES

III. DATA DICTIONARY AS "DATA FILTER"
   A. BASIC CONCEPTS
      1. Data Dictionary
      2. Metadata
      3. "Active" Data Dictionary
      4. Data Extraction
   B. CONFIGURATION
      1. Metadata Generation
      2. Edit and Validation Programs
      3. General Design
   C. ADVANTAGES

IV. PLANNING AND GENERAL DESIGN
   A. KEY DEVELOPMENT PHASES
   B. PHASE ONE - SYSTEM ENVIRONMENT/GENERAL CHARACTERISTICS
      1. Description
      2. Checklist
   C. PHASE TWO - DATA DEFINITION/VALIDATION CRITERIA
      1. Description
      2. Checklist
   D. PHASE THREE - INITIAL LOGICAL DESIGN
      1. Description
      2. Checklist
      3. Follow-on Design

V. THE DCSPLANS "DATA FILTER" SYSTEM
   A. DCSPLANS SYSTEM GOAL AND OBJECTIVES
   B. "DATA FILTER" STRUCTURE
      1. Structure Diagram
      2. Narrative Descriptions
   C. "DATA FILTER" IMPLEMENTATION

VI. CONCLUSIONS AND RECOMMENDATIONS
   A. CONCLUSIONS
   B. RECOMMENDATIONS

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST

LIST OF FIGURES

1.1 Force Plans Branch
3.1 Passive Dictionary
3.2 Active Dictionary
3.3 Data Extraction Design
3.4 General Data Filter Design
3.5 The Data Filter System
4.1 Structure Diagrams
5.1 Sample Paragraph Format

I. INTRODUCTION

A. CONTROL OF DATA

One problem plaguing today's information manager is the serious lack of control over data which has developed as computers and their applications have spread throughout organizations. Recently, there has been a considerable increase in the attention being paid to this problem. However, most organizations whose information systems were developed in the 60's and early to middle 70's still suffer the ill effects of improperly controlled data. In these environments, redundant, incomplete, and inaccurate data are still prevalent. Under such circumstances, the probability that faulty data will directly contribute to poor organizational planning and ineffective decision-making is significantly increased.

While some organizations have undertaken action to correct their data control problems, many others are overwhelmed by the enormity, complexity, and cost of the task. In very large organizations, the cost and complexity take on proportions that appear extremely prohibitive. Unfortunately, it is these large organizations which have the greatest need for carefully controlled data. Large organizations are also more likely to experience adverse effects which extend beyond those found in smaller enterprises.

One of these effects is manifest in the helpless position in which some organizational user groups find themselves. As one cog in a large wheel, these groups often are forced to use data collected and processed by other organizational elements over whom they exercise no control. A serious danger in this circumstance is the receipt and subsequent use of inaccurate data.

Information systems need valid data to be effective! A rash assumption by a data processing element that inaccurate data are correct can have devastating effects on a parent organization, especially if information based on the data is used for strategic planning/decision-making.

When input data of unknown validity is being transferred among data processing elements within an organization, the problem is almost always a systemic one with deep and widespread roots. Corrective action on an organization-wide basis often is neglected because of excessive costs. Users who find themselves in these situations are frequently left to their own devices, and they must develop their own means for validating inputs. An illustration of a user group experiencing such a situation is the Office of the Deputy Chief of Staff, Plans (DCSPLANS), U.S. Army Military Personnel Center (MILPERCEN), in Alexandria, Virginia.


B. DCSPLANS, MILPERCEN

U.S. Army MILPERCEN is responsible for the worldwide distribution and professional development of Army officer and enlisted personnel. Within MILPERCEN, DCSPLANS has the mission of planning, programming, and executing current and future force alignment, i.e., matching personnel inventory to force authorization levels.

DCSPLANS is composed of five branches, each of which monitors a specified portion of the force alignment mission.

FORCE PLANS BRANCH

INPUTS TO MODELS
   OFFICER MASTER FILE
   GAIN/LOSS TAPE
   PERSONNEL MANAGEMENT AUTHORIZATION DOCUMENT
   SPACE IMBALANCE MILITARY OCCUPATIONAL SPECIALTY FILE
   RETENTION RATES REPORT
   SELECTIVE REENLISTMENT BONUS LEVELS
   TRAINEES, TRANSIENTS, HOLDEES, STUDENTS REPORT

MODELS
   PERSONNEL POLICY PROJECTION MODEL
   OFFICER PROMOTION MODEL
   MANNING INDICATOR MODEL
   OBJECTIVE FORCE MODEL
   OFFICER ASSET UTILIZATION MODEL
   OFFICER STRENGTH MANAGEMENT MODEL
   CORRECTABLE AUTHORIZATIONS DATA BASE MODEL

Figure 1.1 Force Plans Branch

Each branch uses a series of computerized models to perform a variety of forecasting functions. See Figure 1.1 for an example of the models and input files used by DCSPLANS' branches. Many of these models are quite complex and draw input data from both MILPERCEN and non-MILPERCEN sources. Some input files are extremely large, feed a number of models, and historically, have been prone to error. None of the input files are under DCSPLANS control.

The output of DCSPLANS' models is used for crucial top level decision making which will determine the structure and content of Army forces well into the future. As such, the DCSPLANS output must exhibit a very high degree of validity. Currently, however, DCSPLANS is unable to verify the accuracy of much of the input data being used by its models. Thus, despite the correctness of the models themselves, the reliability of the DCSPLANS product must be considered doubtful.

DCSPLANS officials are quite concerned about their present inability to ensure that the data used in their models are accurate. They realize the problem will not be solved for them soon by the organization (MILPERCEN), and that they must devise their own local solution. A variety of options are available to them. Some are quite poor (e.g., maintain the status quo and rely on the input data sources to ensure validity); others are more feasible, but still contain serious shortcomings (e.g., update/convert every DCSPLANS model to include its own validation process). A much more effective and efficient alternative is described in this paper, i.e., the use of an active data dictionary as a "filter" to validate input before the data is processed by the various models.

C. THESIS METHODOLOGY

This thesis will explore the concept of using an active data dictionary as a local validation tool. It will proceed from a general review of data validation, through an examination of data dictionaries and their design, to an illustration of how an active data dictionary can be beneficially applied to DCSPLANS operations.

Chapter Two of the thesis cites the essential role of data validation as an integral part of a data processing system. Validation criteria and techniques used in the "data filter" are reviewed, and the general nature of edit and validation rules is introduced.

Chapter Three explores the data dictionary. It includes some basic definitions and concepts, and specifically addresses how an active data dictionary is used to validate data.

Chapter Four outlines an approach to "local" initial design of a data dictionary "filter" system. This chapter also includes a recommended "checklist" of questions a user group can ask to define its own data dictionary/validation requirements and system structure.

Chapter Five specifically addresses the DCSPLANS situation. It cites a proposed goal and some key objectives of a DCSPLANS validation system, and uses a modified structure diagram of a "data filter" to illustrate the recommended approach to DCSPLANS' data validation dilemma.

Chapter Six summarizes the results of this thesis.


II. INPUT VALIDATION

A. GENERAL DESCRIPTION

Inaccurate data items can easily find their way into master files and databases, either through direct input by users or through improper processing actions by application programs. Regardless of origin, inaccurate data are poison in any ADP system. Information created from inaccurate data also tends to be inaccurate, and decisions based upon such information are counterproductive to organizational goals in almost every instance. Data is a valuable resource, and its accuracy is crucial to organizational success.

Validation is that set of actions which attempts to preclude the existence of inaccurate data within an information system. Validation tests can be implemented at any number of stages within the data processing cycle: prior to input, upon input, during processing, and after processing (output checks). "Input validation", as implemented by an active data dictionary system, occurs at the second stage.

Input validation focuses specifically on data being entered into a system. Its aim is to detect errors and thereby ensure the initial accuracy of the master file or database being constructed/updated. [Ref. 1:p. 326] During input validation, checks are conducted to ensure that the input/update operation itself is legal, and that input data does not violate prescribed accuracy constraints. Creation of a new file or the update of an existing one is a processing stage that demands extremely careful data validation, especially in those cases where input data is received from sources outside the control of the processing element. Fortunately, it is at this stage that the accuracy of data can be checked most effectively [Ref. 2:p. 289]. One additional caution which must be mentioned at this point is that data does not become inaccurate from entry errors alone. Data may be inaccurate simply because it is old! Previously accurate values may no longer be correct because available new values have not superseded older values due to neglected updates. Validation processes also must check for these types of inaccuracies.

B. VALIDATION TECHNIQUES

1. Category

The general category of input validation techniques used by the "data filter" being proposed examines input data in the exact form in which it arrives for processing. The techniques involved detect errors by checking the "acceptability" of both the data transactions and the data itself. This checking is accomplished through a series of programmed instructions/rules, and is implemented very effectively by an active data dictionary system. Three basic techniques are included in the category: transaction validation, format checks, and reasonableness checks. A well designed validation program includes a combination of all three. [Ref. 3:p. 248]

The transaction validation technique is used to verify the legitimacy of transactions which input data. The format checks and reasonableness checks, on the other hand, are used to examine the correctness of data items themselves. In order to facilitate a clearer picture of the "data filter" design which will be presented in the next two chapters, a brief description of the three validation methods is provided below.

2. Transaction Validation

Transaction validation should be the first technique to be applied. It certifies that "a specific transaction is one that can be processed by the system and is being submitted properly." [Ref. 4:p. 218] Its focus is the verification that the type and purpose of the transaction are legitimate processing actions, and that the originator of the transaction has the authority to initiate it. Transactions determined to be inaccurate are rejected.

Related validation checks which also must be conducted during this juncture of the processing cycle are checks for sequential dependencies and/or proper timing. For example, a Monthly_Report transaction may not be able to take place until Monthly_Update transactions are successfully executed.

The role of transaction validation as a "first step" stems from the potential damage which could be inflicted upon a system by the processing of an invalid transaction. Even if the invalid transaction is subsequently discovered, recovery may prove extremely difficult. An ounce of prevention, in this case, is certainly worth a pound of cure!
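
As a rough illustration of the checks just described, the following Python sketch tests a transaction's type, the originator's authority, and a sequential dependency before any data is touched. The transaction names follow the Monthly_Report/Monthly_Update example; the table layout, the originator roles, and the function name validate_transaction are assumptions made only for this illustration.

```python
# Hypothetical tables (assumed for illustration): which transaction types
# exist, who may submit them, and which transactions must run first.
AUTHORIZED = {
    "Monthly_Update": {"clerk", "analyst"},
    "Monthly_Report": {"analyst"},
}
PREREQUISITES = {"Monthly_Report": ["Monthly_Update"]}


def validate_transaction(tx_type, originator, completed):
    """Return the reasons for rejection; an empty list means accept."""
    if tx_type not in AUTHORIZED:
        return [f"unknown transaction type: {tx_type}"]
    errors = []
    if originator not in AUTHORIZED[tx_type]:
        errors.append(f"{originator} may not initiate {tx_type}")
    for needed in PREREQUISITES.get(tx_type, []):
        if needed not in completed:
            errors.append(f"{tx_type} requires prior {needed}")
    return errors
```

A caller would reject the transaction whenever the returned list is non-empty, so that no input data reaches the format and reasonableness checks.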

Once transaction validity is established, the input data itself is examined through a series of format checks and reasonableness checks.


3. Format Checks

Format checks compare the actual contents of a field to a pre-set series of user-defined rules. A record whose contents fail to conform to the prescribed format either is rejected outright or transferred to an appropriate error handling routine. Some of the more common format checks are:

a) Length Checks: used to verify that a field contains a prescribed minimum, maximum, or fixed amount of characters.

b) Character Type Checks: used to verify that a field contains only specifically authorized value types, i.e., numerics only, alphabetics only, blanks, or special characters.

c) Character Pattern Checks: used to verify that the contents of a field match a prescribed pattern of alphabetics, numerics, dashes, etc.

d) Date Checks: used to ensure that the contents of a date field are entered in the required, standard format, i.e., YYMMDD or YYDDD.
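
The four format checks listed above can be sketched in Python roughly as follows. This is a minimal sketch, not a prescribed implementation: the function names and the use of regular expressions for pattern checks are assumptions, and only numeric and alphabetic character types are handled.

```python
import re
from datetime import datetime


def length_ok(value, minimum=0, maximum=None):
    """Length check: character count must lie within [minimum, maximum]."""
    return len(value) >= minimum and (maximum is None or len(value) <= maximum)


def type_ok(value, kind):
    """Character type check for 'numeric' or 'alphabetic' fields."""
    if kind == "numeric":
        return value.isascii() and value.isdigit()
    if kind == "alphabetic":
        return value.isascii() and value.isalpha()
    raise ValueError(f"unsupported kind: {kind}")


def pattern_ok(value, pattern):
    """Character pattern check: the whole field must match the pattern."""
    return re.fullmatch(pattern, value) is not None


def date_ok(value):
    """Date check: accept YYMMDD (6 digits) or YYDDD (5 digits, day of year)."""
    fmt = {6: "%y%m%d", 5: "%y%j"}.get(len(value))
    if fmt is None or not (value.isascii() and value.isdigit()):
        return False
    try:
        parsed = datetime.strptime(value, fmt)
    except ValueError:
        return False
    # Round-trip guard: reject values that parse into a different date.
    return parsed.strftime(fmt) == value
```

For instance, a hyphenated nine-digit identifier could be tested with pattern_ok(value, r"\d{3}-\d{2}-\d{4}"), where the pattern itself is only an example.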

4. Reasonableness Checks

Reasonableness checks test data items to ensure that data values fall within the limits of established constraints. These constraints are separated into three basic types. Field constraints limit the value of a given data item. Intrarecord constraints limit values between fields in the same record. Interrecord constraints limit values between fields in different records. [Ref. 5:p. 179] Reasonableness checks based upon field constraints are fairly straightforward in design and application. Intrarecord and interrecord constraint checks, however, deal with logical accuracy and the interrelationships among data items. As such, they are much more difficult to develop and manage. Common reasonableness checks are:

a) Field Constraints:

- Range Checks - used to verify that the field value falls within a specified range, i.e., the value does not violate an upper or lower limit.

- Sequence Checks - used to test a specially created field to ensure records are processed in the proper order. These checks are also used to verify the presence of all required records.

- Completeness Checks - used to confirm that each mandatory field in a record is filled with a data item of some prescribed size.

- Date Checks - used to verify that the contents of a date field do not violate earliest or latest acceptable date restrictions.

- Code Checks - used to verify that the contents of a code field are contained within a listing of valid and current codes.

b) Intrarecord and Interrecord Constraints:

- Completeness Checks - used to identify those fields in a record which must be filled based upon the contents of other fields in that record (intrarecord) or other records (interrecord).

- Consistency Checks - used to verify that the values in certain fields are valid in relation to the data values of other fields (either in the same record or other records).
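
The field constraint checks in list a) are simple enough to sketch directly. The Python below is one possible rendering, with assumed function names; records are modelled as plain dictionaries, which is a convenience of the sketch and not something the checks require.

```python
from datetime import date


def range_check(value, low, high):
    """Range check: value must not violate the lower or upper limit."""
    return low <= value <= high


def sequence_check(sequence_numbers):
    """Sequence check: a list of numbers must run 1, 2, 3, ... with no gaps."""
    return list(sequence_numbers) == list(range(1, len(sequence_numbers) + 1))


def completeness_check(record, mandatory):
    """Return the mandatory fields that are missing or blank."""
    missing = []
    for name in mandatory:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


def date_limit_check(value, earliest, latest):
    """Date check: value must fall between the earliest and latest dates."""
    return earliest <= value <= latest


def code_check(value, valid_codes):
    """Code check: value must appear in the listing of valid, current codes."""
    return value in valid_codes
```

Each function answers a single question about a single field, which is why checks of this type are the easiest to design and maintain.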

An example of an intrarecord completeness check is, "if the Conversion Indicator field in a record is filled, then the Conversion Code field in that record must also be filled." An interrecord version of a completeness check is as follows: "if the VRB Multiplier field is filled for any record in this run, then all VRB Multiplier fields must be filled."

An example of an intrarecord consistency check is, "If the MOS code in a record is [illegible], then the grade value in the record must be either [illegible] or [illegible]." An interrecord consistency check is, "no SSN field value may be the same as the SSN field value of another record."

It is also possible to have "interfile" dependencies, e.g., a record with an SSN field value of "9999999" in file "A" must have the same MOS field value as a record in file "B" which has an identical SSN field value of "9999999."
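
The completeness, duplicate-identifier, and interfile rules quoted above might be expressed as in the following sketch. The field names CONV_IND, CONV_CODE, VRB_MULT, SSN, and MOS are hypothetical stand-ins for the fields named in the examples, and records are again modelled as dictionaries.

```python
from collections import Counter


def filled(record, field):
    """A field counts as filled when it holds a non-blank value."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def intrarecord_complete(record):
    """If the indicator field is filled, the code field must be filled too."""
    return not filled(record, "CONV_IND") or filled(record, "CONV_CODE")


def interrecord_complete(records, field="VRB_MULT"):
    """If the field is filled in any record of the run, it must be in all."""
    flags = [filled(r, field) for r in records]
    return not any(flags) or all(flags)


def duplicate_ssns(records):
    """Interrecord consistency: list SSN values shared by several records."""
    counts = Counter(r["SSN"] for r in records)
    return sorted(ssn for ssn, n in counts.items() if n > 1)


def interfile_mismatches(file_a, file_b, key="SSN", field="MOS"):
    """Interfile check: records with the same key must agree on the field."""
    lookup = {r[key]: r[field] for r in file_b}
    return [r[key] for r in file_a
            if r[key] in lookup and lookup[r[key]] != r[field]]
```

Unlike the field constraint checks, every function except the first two needs the whole run (or a second file) in hand, which hints at why these checks are harder to develop and manage.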


C. EDIT AND VALIDATION RULES

There must be an organized and consistent method for applying the validation checks cited above to data being input into an information system. The vehicle for this application is the edit and validation rule (EVR). EVRs are explicit statements of constraints about the data in a system. These rules monitor the basic structure and relationships of data items, and enforce processing restrictions established by the information manager. [Ref. 6:p. 146]

Two key issues concerning EVRs must be addressed when building a data validation system. The first is how to properly develop consistent rules. Consistent rules promote accurate data, whereas contradictory rules produce an unreliable data system that eventually will crash. (Definition and development of EVRs will be covered in Chapter Four as an integral part of the overall "data filter" design process.)

The second key issue is where to place an EVR module (i.e., is it better to embed it as part of an application program, or is it better to make it a separate validation program?). The use of an active data dictionary as a "data filter" argues for the latter approach. The rationale for such a placement is set forth in the next chapter.
III. DATA DICTIONARY AS "DATA FILTER"

A. BASIC CONCEPTS

Four basic concepts are central to a clear understanding of how a data dictionary can be used locally to validate data maintained and provided by other sources. These are: Data Dictionary, Metadata, "Active" Data Dictionary, and Data Extraction.

1. Data Dictionary

A data dictionary is a centralized repository of all definitive information about the relevant data in an enterprise. The data dictionary provides the user a description of what data exists, what it looks like, and what it means. [Ref. 7:p. 1] A data dictionary can be as simple as a manual catalog system or as complex as an automated set of programs which controls a wide range of the enterprise's data processing operations.

2. Metadata

The real world of an enterprise contains a number of data objects (entities) which are represented in the enterprise's information system as data elements, records and files. For example, customers (entity) are represented by a set of data elements/fields (CUST_ID, CUST_NAME, etc.) which comprise records (CUST_REC), which, in turn, are grouped into files (CUST_FILE). The data used to define and describe these entities are called metadata, i.e., data about the data. Metadata are stored in the data dictionary, forming a metadata database or metadatabase. [Ref. 8:p. 9] Dictionary metadata contain the characteristics of each data object. The metadata answer the following questions:

a) What data is available in the enterprise?

b) What does the data mean?

c) How is the data structured?

d) What constraints and relationships exist?

Typically, dictionary metadata include: object name, short name, synonyms or aliases, source, narrative description, records/files that use or contain the data object, data structure/format, integrity constraints (e.g., value range), and relationships/dependencies. [Ref. 9:p. 18] Metadata are essential ingredients in the validation of data by a data dictionary system.
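
To make the list of typical metadata concrete, the sketch below models one dictionary entry as a Python dataclass and stores it in a small in-memory metadatabase. The attribute names mirror the items listed above; the class name, the storage layout, and every sample value for the hypothetical customer identifier field are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataEntry:
    """One dictionary entry describing a data object (layout assumed)."""
    object_name: str
    short_name: str
    description: str
    source: str
    data_format: str                                  # e.g. "numeric"
    length: int
    aliases: list = field(default_factory=list)
    used_in: list = field(default_factory=list)       # records/files
    value_range: Optional[tuple] = None               # integrity constraint
    depends_on: list = field(default_factory=list)    # relationships


# A tiny in-memory "metadatabase" keyed by short name (sample values only).
metadatabase = {
    "CUST_ID": MetadataEntry(
        object_name="Customer identifier",
        short_name="CUST_ID",
        description="Unique number assigned to each customer",
        source="Order entry",
        data_format="numeric",
        length=6,
        used_in=["CUST_REC", "CUST_FILE"],
        value_range=(1, 999999),
    ),
}
```

Because the format, length, and value range live in the entry and not in any program, a validation routine can look them up at run time.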

3. "Active" Data Dictionary

There are two basic modes in which a data dictionary can function: passive or active. A passive data dictionary merely registers the metadata and provides the user a facility for interactive query and/or report generation. It does not require that data processing operations depend upon it for metadata, and no direct link is maintained between the passive data dictionary and other system components. (See Figure 3.1) In fact, application programs and processes may obtain their metadata entirely from other sources.

An active data dictionary, on the other hand, exercises a great deal of control over processing and metadata usage within an information system. A data dictionary is said to be active with respect to an information system, if, and only if, that system is dependent upon the data dictionary for its metadata. (See Figure 3.2)

Figure 3.1 Passive Dictionary

A dictionary is active to a lesser degree if only some of the system's programs and processes are dependent upon it for metadata. The more programs or processes that rely on the dictionary, the more active it is said to be. [Ref. 10:p. 22] The value of an active data dictionary stems from the establishment of mandatory interfaces between it and various system processes. When the data dictionary is used as a "data filter", these mandatory interfaces will ensure that input data conform to pre-defined rules and standards.

[Figure 3.2 legend distinguishes user interfaces from software interfaces.]

NOTE: The "processed data" block shown in the figure includes metadata and all programs used by the data dictionary.

Figure 3.2 Active Dictionary


4. Data Extraction

Data extraction is a technique whereby a subset of data from a very large file system or database is transferred to a much smaller "extracted" file or database. The data extraction process can be either quite simple or very complex. A complex data extraction process is designed to collect, format, and integrate data from a number of source files/databases into a single data source whose contents are specifically tailored to the needs of a single user or group of users. Such a system involves extensive data description, subsetting, aggregation, and presentation operations. [Ref. 11:p. 245] This thesis addresses data extraction from a much simpler perspective, i.e., as a means to limit the size of the data to be validated by the data dictionary. In most cases, user applications do not need all data contained in a large data source. Thus, the extraction of only pertinent data (a much smaller subset) usually serves to increase the speed of application programs acting upon the data. Such data extraction operations can be used to greatly enhance the efficiency of the proposed "data filter" when large source files are involved. A diagram of a simple data extraction design which can be used in conjunction with a data dictionary "filter" is shown in Figure 3.3.
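
In its simple form, extraction amounts to selecting the pertinent records and keeping only the pertinent fields. The following Python sketch shows that idea under stated assumptions: the function name extract, the dictionary record layout, and the sample field names and values are all hypothetical.

```python
def extract(source_records, wanted_fields, keep=lambda record: True):
    """Yield reduced copies of only the records the user group needs."""
    for record in source_records:
        if keep(record):
            yield {name: record[name] for name in wanted_fields}


# Sample source file (values invented for the illustration).
source = [
    {"SSN": "1", "GRADE": "O3", "MOS": "11A", "UNIT": "X"},
    {"SSN": "2", "GRADE": "E5", "MOS": "11B", "UNIT": "Y"},
]

# Keep officer records only, and only two of their four fields.
officers = list(
    extract(source, ["SSN", "GRADE"], keep=lambda r: r["GRADE"].startswith("O"))
)
```

Only the reduced records in officers would then be passed on for validation, which is where the saving in validation time comes from.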

Throughout the remainder of this thesis, the term "data filter" will refer to the active data dictionary validation system being proposed.

Figure 3.3 Data Extraction Design

B. CONFIGURATION

1. Metadata Generation

The key to constructing the data filter is incorporating into a data dictionary the capability to generate the metadata needed by a system's edit and validation software. The metadata generation is triggered by the edit and validation software through the issuance of commands and applicable parameters. The data filter must be designed so that the edit and validation, with its mandatory call for metadata generation, is automatically activated during all data input operations. The resulting metadata generation produces data descriptions based upon the characteristics stored in the data dictionary metadatabase. These data descriptions are transformed into specific edit and validation rules (EVR) for use by the edit and validation programs. [Ref. 12:p. 116]
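
One way to picture the transformation of stored characteristics into EVR is the sketch below. It is only an assumed rendering: the metadatabase layout, the field names and limits under the "OMF" key, and the representation of each rule as a (field, kind, argument) tuple are all invented for illustration.

```python
# Hypothetical metadatabase: for each input file, the stored
# characteristics of each field (names and limits are assumptions).
METADATABASE = {
    "OMF": {
        "SSN": {"type": "numeric", "length": 9, "mandatory": True},
        "GRADE": {"codes": ["O1", "O2", "O3"], "mandatory": True},
        "YEARS_SVC": {"type": "numeric", "range": (0, 40)},
    },
}


def generate_evr(file_name):
    """Turn stored field characteristics into a flat list of EVR tuples."""
    rules = []
    for field_name, meta in METADATABASE[file_name].items():
        if meta.get("mandatory"):
            rules.append((field_name, "completeness", None))
        if "type" in meta:
            rules.append((field_name, "type", meta["type"]))
        if "length" in meta:
            rules.append((field_name, "length", meta["length"]))
        if "range" in meta:
            rules.append((field_name, "range", meta["range"]))
        if "codes" in meta:
            rules.append((field_name, "code", tuple(meta["codes"])))
    return rules
```

In this arrangement the edit and validation software would call generate_evr with the name of the file being input, which corresponds to the "command and applicable parameters" that trigger metadata generation.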

2. Edit and Validation Programs

Edit and validation programs are separate from the application programs which enter the data into the system. They cannot be executed without data dictionary metadata (in the form of EVR) through which they will filter all incoming data. These programs are usually general purpose in nature. The tailoring of the programs to specific types of data is accomplished through the EVR provided by the active data dictionary. For example, an OMF data entry operation will result in different EVR being passed to an edit and validation program than will a PMAD data entry operation (PMAD data may be composed of totally dissimilar data objects than OMF data, and may also involve very different validation criteria). Various edit and validation programs can be incorporated into the data filter to accommodate distinct categories of data entry operations, e.g., updates, deletions, creation of new files, etc.
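
A general purpose edit and validation program of this kind contains no knowledge of any particular file; everything file-specific arrives as EVR. The sketch below illustrates the idea, assuming each rule is a (field, kind, argument) tuple and each record a dictionary. The rule kinds and function names are hypothetical.

```python
def check(value, kind, arg):
    """Apply one rule kind to one field value; True means the value passes."""
    text = "" if value is None else str(value).strip()
    if kind == "completeness":
        return text != ""
    if text == "":
        return True  # blank optional fields are left to the completeness rule
    if kind == "length":
        return len(text) == arg
    if kind == "type":
        return text.isdigit() if arg == "numeric" else text.isalpha()
    if kind == "range":
        return text.isdigit() and arg[0] <= int(text) <= arg[1]
    if kind == "code":
        return text in arg
    raise ValueError(f"unknown rule kind: {kind}")


def edit_and_validate(records, evr):
    """Filter records through the EVR; return (accepted, rejected)."""
    accepted, rejected = [], []
    for record in records:
        failures = [(name, kind) for name, kind, arg in evr
                    if not check(record.get(name), kind, arg)]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected
```

Handing the same edit_and_validate function a different rule list is all that is needed to validate a different kind of input, which is the tailoring described above.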


3. General Design

Figure 3.4 depicts a generalized data filter design. The data dictionary generates metadata based upon commands [...]

Figure 3.4 General Data Filter Design

[...] increases data validation efficiency by reducing the amount of data to be "filtered." In the DCSPLANS' case, due to the enormity of the OMF and some other source data files, the time saved becomes quite significant.

C.  ADVANTAGES 

Almost  all  data  editing  ai.  1  validation  systems  provide 
the  user  a  canabilty  to  validate  and  edit  data,  and  to 
correct  and  report  erroneous  lata.  There  are,  however, 
added  horiefits  to  be  gained  by  using  tbe  active  lata 
dictionary  approach  which  forms  the  basis  of  the  data  filter 
configuration  described  above. 

First,  since  the  active  data  dictionary  becomes  the 
sole  source  of  metadata  for  all  edit  and  validation 
processes,  redundant  retadata  is  eliminated  anl  metadata 
consistency  is  promoted.  In  essence,  a  much  greater  degree 
of  contrcl  over  metadata  is  realized,  and,  as  a  result, 
regulated,  consistent  validation  of  lata  is  achieved. 

Second, the data dictionary affords the user a very
flexible and easily adjustable validation mechanism.
Changes in data and revisions to validation criteria do not
require modification of application programs or edit and
validation programs. Instead, changes are easily
accommodated by simple adjustments to metadata/EVR.
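
The point that validation criteria can change without touching the
validation program can be illustrated with a minimal sketch. The
element names, the dictionary layout, and the validate function below
are assumptions made for illustration only; they are not taken from
the DCSPLANS system.

```python
# Hypothetical in-memory "metadatabase": one entry per data element.
METADATA = {
    "BRANCH": {"type": "alpha", "max_len": 2},
    "YEARS_OF_SERVICE": {"type": "numeric", "min": 0, "max": 30},
}


def validate(element, value, metadata=METADATA):
    """Return a list of error strings; an empty list means the value passed."""
    rules = metadata[element]
    errors = []
    if rules["type"] == "numeric":
        if not value.isdigit():
            return ["not numeric"]
        number = int(value)
        if "min" in rules and number < rules["min"]:
            errors.append("below minimum")
        if "max" in rules and number > rules["max"]:
            errors.append("above maximum")
    elif rules["type"] == "alpha":
        if not value.isalpha():
            errors.append("not alphabetic")
        if "max_len" in rules and len(value) > rules["max_len"]:
            errors.append("too long")
    return errors


# Revising a criterion is a metadata change only; validate() is untouched.
METADATA["YEARS_OF_SERVICE"]["max"] = 40
```

After the last line, a value of 35 years passes a check it would have
failed under the earlier limit of 30, although no program code was
modified.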

Third, should the information system involved be
file-based (as is the case with DCSPLANS), the data
dictionary approach is an invaluable "bridge" for a future
transition to a database system. Ease of transition is
promoted by already having in existence an organized,
centralized store of the enterprise's metadata.

One other benefit of the proposed data filter system
stems from the separation of the data extraction program
from the actual edit and validation activities. Not only is
overall validation speed increased, but also the user now
has the option, in exigent circumstances, to forego
validation entirely if time constraints demand such action.
An interdependent extraction/validation process would not
allow this alternative.
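
The separation of extraction from validation can be sketched as two
independent steps joined by an optional switch. All function and
parameter names here are hypothetical stand-ins; the sketch only
shows that validation can be skipped when the steps are not
interdependent.

```python
def extract(source_records, wanted_fields):
    """Stand-in for the data extraction program: keep only the wanted fields."""
    return [{f: rec[f] for f in wanted_fields} for rec in source_records]


def validate(records, is_valid):
    """Stand-in for edit and validation: split records into valid and erroneous."""
    valid = [r for r in records if is_valid(r)]
    errors = [r for r in records if not is_valid(r)]
    return valid, errors


def prepare_model_input(source_records, wanted_fields, is_valid,
                        skip_validation=False):
    """Extract first; validate only if the user has not chosen to bypass it."""
    extracted = extract(source_records, wanted_fields)
    if skip_validation:
        # Exigent circumstances: pass the extracted data through unchecked.
        return extracted, []
    return validate(extracted, is_valid)
```

Because extract never calls validate, the bypass is a single branch
rather than a change to either program.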




IV.  PLANNING  AND  GENERAL  DESIGN 


A.  KEY  DEVELOPMENT  PHASES 

A software product's ability to do what it is supposed
to do efficiently is largely governed by the quality of the
detailed design and coding that creates it. In turn,
successful detailed design and coding are directly tied to
the quality of initial planning and design activities.
Thus, the planning and preliminary design steps taken by
users to develop a local data filter are crucial, and must
be comprehensively and carefully accomplished.

Planning and initial design of a data filter is a
three-phased process. Phase one describes the system's
environment and general characteristics. Phase two develops
data definitions and validation criteria. Phase three
produces an initial logical design of the system. A
description of each of these phases is presented below,
along with a "checklist" of relevant questions which serves
as a guide for proceeding through the phase.

The checklists form a framework within which
users/developers can methodically develop the data filter.
The framework assists them in:

1. Obtaining a clear, comprehensive picture of the
environment in which the data filter will function.

2. Identifying and defining the data to be validated,
and determining the nature and scope of validation
required.

3. Constructing well-defined, functionally structured
validation and EVR modules.

B. PHASE ONE - SYSTEM ENVIRONMENT/GENERAL CHARACTERISTICS


1. Description

This phase identifies all hardware and firmware
being used (or projected for use) in the overall information
system, and describes its environment (e.g., distributed vs.
centralized system, file system vs. database system, etc.).
It notes validation capabilities already built into the
system, and also identifies commercial validation
capabilities which are compatible with existing hardware and
firmware.

Phase one also uncovers the general nature of the
input data to be validated. It identifies the broad
categories of input data, examines data stability and
consistency, and looks at who exercises control over the
entry of data into the system. This phase outlines data
entry methods and notes the various processing stages at
which data validation may occur (pre-input, during input,
etc.). An overview of system output is also formulated.
The level of accuracy required for the output is
established, and the degree to which output validity is
dependent upon valid input is determined.

2. Checklist

Answers to the following questions will provide a
clear picture of the overall system, including inputs and
outputs:

a) What major hardware components comprise the system?

b) What operating system is used?

c) What validation capabilities are already built into
the system hardware/firmware?

d) Are there currently any plans to change/expand major
system hardware?

e) Are any system-compatible data validation products
currently available (either in-house or commercially)?

f) What system-compatible data dictionary software is
currently available (either in-house or commercially)?

g) Are we dealing with a file-based or database system?

h) What portions of the information system are
distributed?

i) How stable are system inputs? (i.e., Are different
data elements, records and files added or deleted on a
frequent basis?)

j) Are data definitions and parameters changed
frequently?

k) Are we dealing with a stable number of data elements
which will retain stable attributes?

l) Is input processed in a batch mode, on-line, or both?

m) Is any pre-input validation conducted? Describe.

n) Is any output validation conducted? Describe.

o) What are the sources of input data? Identify all
input files and the applications for which they
provide data.

p) What degree of control over the entry and update of
input data is exercised by system users?

q) From what locations, and by whom, can data be added,
changed or deleted?

r) What sources beyond the user's control provide input
data? Identify the data provided by each of these
outside sources.

s) How often is data entered? Updated?

t) How is the processed data being used? (A general
description, e.g., report generation, modeling, etc.)

u) For each application, report, etc., how critical is
validity? (i.e., What are the consequences of
inaccurate outputs?)


C. PHASE TWO - DATA DEFINITION/VALIDATION CRITERIA


1. Description

This phase identifies and defines the system's data
entities. For the purpose of the data filter, data entities
include all data elements entered into the system and the
records and files which contain them. The applications
which use/process these entities are also established.

Phase two also sets forth all validation checks
required. Data element characteristics such as description,
range, type, size, sequence, etc. are recorded, and all
entity relationships are carefully delineated. The
information developed during this phase forms the data
dictionary metadatabase, and is used to construct the
system's EVR and validation program modules.


2. Checklist

Answers to the questions listed below will enable
the user/developer to identify, describe, and determine the
interrelationships of all system entities. He will also be
able to establish validation criteria for each entity and
cross-reference them to the applications which require that
such validation occur.

a) What data elements does the system contain?

b) What record(s) contain these data elements?

c) What file(s) contain these records?

d) For each application (model):

- Which files feed it data? Which records?

- Which data elements does it use/process?

- Which data elements must be validated (i.e., does
the validity of the application's output depend on
this input data element being valid)?

- Is a specific sequence of data entry required?

- What pre-entry updates/transactions must occur, if
any?

e) For each data element:

- What is its name? Any synonyms or aliases?

- What is its Short Name/Programming Name?

-  What  is  its  TE^? 

- What is its character type (alpha, numeric, etc.)?

- What minimum and maximum number of characters are
allowed?

- What numeric value range applies?

- What character pattern is used (e.g., CCC-NN-CC)?

- Is there a minimum/maximum range of allowable change
from one update to the next?

- What cause and effect relationships exist with other
data elements? In the same record/file, in other
records/files? (e.g., If "A" is changed, then "B"
must be changed).

- Is a particular update sequence required?

- Do date fields have any earliest or latest date
limits?

- Do date fields require a special format (e.g.,
yymmdd)?

- What direct relationships exist with other data
items? (e.g., value of "A" must always be twice
that of "B").

- Is the data element a code or a value that can be
checked against a table or listing of valid codes or
values?
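
The answers gathered for each data element map naturally onto a
metadata record. The sketch below is one possible layout, not the
thesis's design: the field names, the use of regular expressions for
character patterns, and the check function are all assumptions made
for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ElementDefinition:
    name: str
    short_name: str
    char_type: str                        # "alpha", "numeric", or "any"
    min_len: int = 0
    max_len: int = 255
    value_range: Optional[tuple] = None   # (low, high) for numeric elements
    pattern: Optional[str] = None         # regular expression for the pattern
    date_format: Optional[str] = None     # strptime format, "%y%m%d" = yymmdd
    valid_codes: Optional[frozenset] = None


def check(defn, value):
    """Return the names of the checks that the value fails."""
    errors = []
    if not defn.min_len <= len(value) <= defn.max_len:
        errors.append("size")
    if defn.char_type == "alpha" and not value.isalpha():
        errors.append("type")
    if defn.char_type == "numeric":
        if not value.isdigit():
            errors.append("type")
        elif defn.value_range is not None:
            low, high = defn.value_range
            if not low <= int(value) <= high:
                errors.append("range")
    if defn.pattern and not re.fullmatch(defn.pattern, value):
        errors.append("pattern")
    if defn.date_format:
        try:
            datetime.strptime(value, defn.date_format)
        except ValueError:
            errors.append("date")
    if defn.valid_codes is not None and value not in defn.valid_codes:
        errors.append("code")
    return errors
```

Relationship checks between elements (cause and effect, "A" twice
"B", update sequence) need access to more than one value and are
left out of this single-element sketch.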


D. PHASE THREE - INITIAL LOGICAL DESIGN


1. Description

Phase three produces a model of the logical
structure of the data filter system which later will be
"built" (during coding and testing). Since it forms the
basis for all further design steps and refinements, this
preliminary logical design is the key step in the data
filter design process. The data filter structure developed
during this phase is based upon the general filter design
cited in chapter three and the system environment and
data/validation information gathered during phases one and
two.

Phase three gives the user a description of the data
filter system goal and objectives, and presents the major
system functions. These major functions are then decomposed
into sub-functions until a series of single, independent
modules have been identified. This overall system
architecture is depicted in a hierarchical structure diagram
(see Figure 4.1) accompanied by narrative descriptions of
the modules.

2. Checklist

Answers to the following questions will enable the
user/developer to produce the information described above:

a) What is the goal of the system? (State the general
long-term desired effect).

b) What are the system's key objectives? (Enumerate the
critical milestones to be accomplished to satisfy the
stated system goal).

c) What are the system's major functions? (List the
general processing activities required to meet system
objectives). For example, a bank's checking account
system may have four major system functions: (1)
performing account administration (open accts., close
accts., etc.), (2) processing deposits, (3) processing
withdrawals, (4) maintaining an account transaction
database.

Figure 4.1 Structure Diagram

d) What modules (sub-functions) comprise each of the
system's major functions? (Limit to no more than 3-5
modules per function, and repeat the process level by
level until no further module decomposition is
necessary, i.e., simple, independent modules have been
created).

e) What does each system module do? (Give a precise,
concise description of approximately two sentences).

3. Follow-on Design

Once the above phases have been completed and
carefully documented, the data filter structure has been
tailored to the user's specific environment and validation
needs. Subsequent development involving detailed design
(data flows, data stores, interfaces, etc.), coding,
testing, etc. can follow using one of a number of applicable
methodologies which currently exist.


V. THE DCSPLANS "DATA FILTER" SYSTEM

This chapter specifically addresses the DCSPLANS' "data
filter" system. It provides a statement of the system's
overall goal and its key objectives. It also expands the
general data filter design provided in chapter three into a
more detailed hierarchical design structure tailored to the
DCSPLANS situation.

A. DCSPLANS SYSTEM GOAL AND OBJECTIVES

A number of DCSPLANS' unique operational characteristics
must be considered when formulating the system's goal and
its key objectives. These critical aspects are uncovered
during Phases I and II of the preliminary development
activity (presented in the previous chapter), and are used
to create the Phase III deliverables illustrated in this
chapter (System Goal/Objectives and Structure Diagram with
Narratives). A sample of the DCSPLANS characteristics
having the greatest impact on the general system design is
presented below.

The most important fact is that DCSPLANS personnel have
little faith in the accuracy of input data they are
receiving from a variety of very large source files prepared
and maintained by elements outside their span of control.
At the present time, DCSPLANS does not possess the
capability to validate this questionable input data. They
are, however, extremely worried about the adverse impact of
such input data on the validity of model outputs.

Input source files provide crucial data to DCSPLANS'
force alignment models. Each of the files feeds a varying
number of models, and supplies a unique set of data elements
depending on the particular model involved. Generally, the
data elements contained in the source files and the data
elements required by the models remain the same, creating
relatively good system stability in this regard. There are,
however, occasional changes made in the data elements
provided or required. A DCSPLANS validation tool must
provide the flexibility to incorporate such changes easily.

In many cases, models using the same data elements from
the same source file require different degrees of validation
(e.g., the validity of input data element "A" from the
Enlisted Master File may be crucial to the validity of
Personnel Readiness Indicator Model output, but
inconsequential to the validity of output produced by the
Personnel Policy Projection Model (PPPM)). Thus, a DCSPLANS
validation tool must be able to differentiate between the
validation required for Enlisted Master File data when used
by the Personnel Readiness Indicator Model as opposed to the
PPPM, and it must apply edit and validation rules
accordingly.
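
One way to picture this requirement is a lookup keyed on the source
file, the data element, and the model together. The codes "EMF",
"PRIM" and "PPPM", the level names, and the rules_for function are
hypothetical labels chosen for this sketch.

```python
# Hypothetical table: (source file, data element, model) -> validation level.
VALIDATION_LEVEL = {
    ("EMF", "A", "PRIM"): "strict",   # element "A" is crucial to this model
    ("EMF", "A", "PPPM"): "none",     # and inconsequential to this one
}


def rules_for(source_file, model, elements, default="standard"):
    """Pick the validation level of each element for one file/model pairing."""
    return {
        element: VALIDATION_LEVEL.get((source_file, element, model), default)
        for element in elements
    }
```

The same element from the same file thus receives a different level
of checking depending only on which model will consume it.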

Generally, DCSPLANS' models are run on a standard
schedule which coincides with required briefings/reports and
which also facilitates use of one model's output as input
for another model. There are, however, occasions when a
model's output is required on very short notice. In these
circumstances, the time normally devoted to data validation
may not be available, and the DCSPLANS' models would have to
be run in the quickest possible time without regard to data
integrity. While such a procedure seems unwise, it may
occur, and the DCSPLANS validation tool must provide for
such a contingency by allowing itself to be circumvented if
required. In this regard, the DCSPLANS data filter cannot
be a mandatory part of any integral data extraction or
modeling process.


The majority of DCSPLANS modeling activities will be
done in a batch mode. The extraction of pertinent data from
large input files is also a batch process (e.g., the
"UT?.ACS" program developed and used by DCSPLANS to extract
pertinent data from the Enlisted Master File). However,
capabilities to manipulate data dictionary metadata on-line
and to query the metadatabase on-line are crucial to
effective, user-friendly operation of the data filter
system. All other data filter processes (e.g., EVR
formulation) will be done in batch mode to insure run-time
efficiency.

Based upon an examination of the overall DCSPLANS
situation, and keying on the points just mentioned, the goal
of the DCSPLANS data filter system is to validate all
externally provided input data used by DCSPLANS' force
alignment models in consonance with established DCSPLANS
quality control standards.

Key objectives of the DCSPLANS data filter system are:

1. It must be compatible with the existing DCSPLANS
computer system configuration.

2. It must allow flexible and easy additions and updates
to the metadatabase.

3. Its interface with the data extraction and modeling
processes must be optional (at the discretion of the
Chief, DCSPLANS; otherwise it will be an automatic,
mandatory interface).

4. It must provide for the automatic adjustment of edit
and validation rules to suit the particular source
file and model being processed.

5. It must provide an interactive on-line query facility
for accessing the metadatabase.

6. It must provide an error/status report generation
facility.

7. It must be a user-friendly system.

8. System development and implementation costs must be
consistent with the "local" nature of the system. A
conservative approach is desired.

B. "DATA FILTER" STRUCTURE

This section uses a structure diagram (in modified
format) to set forth the proposed structure of the DCSPLANS
"data filter" system software. The structure is derived
from a functional decomposition process in which major
system functions are split successively into sets of
sub-functions. The proposed DCSPLANS system will be
decomposed to three levels. This decomposition demonstrates
the hierarchical control structure and relationships of
modules which comprise the overall "data filter" program.
It does not represent any particular processing sequence or
order of decision-making. [Ref. 13:p. 149]

The structure diagram is normally presented in the
graphical format shown in Figure 4.1. However, due to the
crowding effect that will occur from a three-level
decomposition, the major system functions (level 1) and
subordinate modules (levels 2 and 3) are represented here in
paragraph/sub-paragraph format (See Figure 5.1). Modules
depicted in this manner are easily transferred to a graphic
representation of the overall system, if required.

1. Structure Diagram

The proposed data filter system contains five major
functions (Control Data Filter System, Maintain
Metadatabase, Produce EVR, Validate Input Data, Generate
Reports). The system's hierarchical structure is
illustrated below, followed by descriptions of each major
function, sub-function, and lower level module.


DCSPLANS Data Filter

1.0 FIRST MAJOR FUNCTION (Level 1)
  1.1 First Sub-function of 1.0 (Level 2)
    1.1.1 First Module of 1.1 (Level 3)
    1.1.2 Second Module of 1.1 (Level 3)
    1.1.3 Third Module of 1.1 (Level 3)
  1.2 Second Sub-function of 1.0 (Level 2)
    1.2.1 First Module of 1.2 (Level 3)
2.0 SECOND MAJOR FUNCTION (Level 1)
  2.1 First Sub-function of 2.0 (Level 2)
    2.1.1 First Module of 2.1 (Level 3)
    2.1.2 Second Module of 2.1 (Level 3)
  2.2 Second Sub-function of 2.0 (Level 2)
  2.3 Third Sub-function of 2.0 (Level 2)
    2.3.1 First Module of 2.3 (Level 3)


Figure 5.1 Sample Paragraph Format
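
The paragraph/sub-paragraph numbering of Figure 5.1 can be generated
mechanically from a nested description of the decomposition. The
number_tree function and its input layout are assumptions made for
this sketch; the thesis itself prescribes only the resulting format.

```python
def number_tree(functions):
    """Render a three-level decomposition in paragraph/sub-paragraph form.

    functions is a list of (function_title, sub_functions) pairs, where
    sub_functions is a list of (sub_function_title, module_titles) pairs.
    """
    lines = []
    for i, (function, subs) in enumerate(functions, start=1):
        lines.append(f"{i}.0 {function.upper()}")          # level 1
        for j, (sub, modules) in enumerate(subs, start=1):
            lines.append(f"  {i}.{j} {sub}")               # level 2
            for k, module in enumerate(modules, start=1):
                lines.append(f"    {i}.{j}.{k} {module}")  # level 3
    return lines
```

Joining the returned lines with newlines gives a listing in the same
shape as the sample format.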


1.0 CONTROL DATA FILTER SYSTEM
  1.1 Verify Transaction Validity
    1.1.1 Read Access and Transaction Codes
    1.1.2 Evaluate Codes
    1.1.3 Implement Validity Decision
  1.2 Provide Menu/Screen
    1.2.1 Read Validity Decision
    1.2.2 Display Appropriate Screen
  1.3 Transfer Control
    1.3.1 Read Screen Input
    1.3.2 Determine Proper Process
    1.3.3 Pass Program Control

2.0 MAINTAIN METADATABASE
  2.1 Control
    2.1.1 Provide Metadatabase Menu
    2.1.2 Transfer Control
  2.2 Add Metadata
    2.2.1 Read Add Data
    2.2.2 Check Uniqueness
    2.2.3 Check Format
    2.2.4 Accept Data
  2.3 Delete Metadata
    2.3.1 Read Delete Request
    2.3.2 Locate Metadata
    2.3.3 Remove Metadata
  2.4 Change Metadata
    2.4.1 Read Change Request
    2.4.2 Locate Metadata
    2.4.3 Update Metadata

3.0 PRODUCE EVR
  3.1 Control
  3.2 Retrieve Metadata
    3.2.1 Read Source File/Model Code
    3.2.2 Open Metadata File(s)
    3.2.3 Extract Pertinent Data Values
  3.3 Formulate EVR
    3.3.1 Load Variables
    3.3.2 Set Switches

4.0 VALIDATE INPUT DATA
  4.1 Control
  4.2 Select EVR
    4.2.1 Determine Input Record Type
    4.2.2 Extract Applicable EVR
  4.3 Apply EVR
    4.3.1 Read Input Data
    4.3.2 Read EVR
    4.3.3 Check Parameters
  4.4 Provide Processed Input Data
    4.4.1 Read Error Code
    4.4.2 Transfer Erroneous Data/Error Code
    4.4.3 Transfer Valid Data
  4.5 Maintain Statistics
    4.5.1 Maintain Transaction Count
    4.5.2 Maintain Error Count
    4.5.3 Sort Error Types

5.0 GENERATE REPORTS
  5.1 Control
  5.2 Retrieve Report/Response Data
    5.2.1 Determine Report/Response Type
    5.2.2 Read Applicable Data
  5.3 Perform Calculations
  5.4 Provide Report/Response
    5.4.1 Determine Format
    5.4.2 Format Data
    5.4.3 Transfer to Output Device

2. Narrative Descriptions

The following are succinct explanations of the key
aspects of each structure diagram function, sub-function
and module. Each lower level description serves to
refine/expand the detail of its superior level.

- 1.0 CONTROL DATA FILTER SYSTEM: This function controls
access to the data filter system and verifies
transaction validity. It also provides screens for
implementing other major system functions, and
transfers control to these processes.

- 1.1 VERIFY TRANSACTION VALIDITY: This sub-function
insures that the user is authorized access to the
system for the desired transaction, and that the
transaction itself is valid (e.g., an attempt to
validate the Enlisted Management File for use in the
Officer Promotion Model would be rejected).

- 1.1.1 READ ACCESS AND TRANSACTION CODES: This module
reads in the user's access code and the transaction
codes indicating the desired process and the source
input file/model(s) involved.

- 1.1.2 EVALUATE CODES: This module checks user-supplied
codes against authorized access and transaction codes.

- 1.1.3 IMPLEMENT VALIDITY DECISION: This module will
either reject the transaction or pass an indication of
a valid transaction to module 1.2.1. This module also
sets restrictions within authorized processes (e.g., a
user may be allowed to add metadata, but not change or
delete existing metadata).

- 1.2 PROVIDE MENU/SCREEN: This sub-function provides
the user with the appropriate screen for continued use
of the system.

- 1.2.1 READ VALIDITY DECISION: This module reads the
validity indicator produced by module 1.1.3.

- 1.2.2 DISPLAY APPROPRIATE SCREEN: This module causes
either a menu or screen, as appropriate, to appear on
the monitor.

- 1.3 TRANSFER CONTROL: This sub-function passes control
to an appropriate system module in response to user
input.

- 1.3.1 READ SCREEN INPUT: This module reads user
responses to terminal prompts.

- 1.3.2 DETERMINE PROPER PROCESS: This module interprets
user input in terms of the desired system function
(e.g., update metadata, generate report, etc.).

- 1.3.3 PASS PROGRAM CONTROL: This module passes control
to the appropriate system module.
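
The control function described in 1.0 through 1.3.3 can be sketched
as an authorization check followed by a dispatch. The access codes,
transaction names, file/model codes ("EMF", "PRIM", "PPPM", "OPM")
and both functions are hypothetical; the real system's code tables
are not given in the text.

```python
# Hypothetical authorization table: access code -> permitted transactions.
AUTHORIZED = {
    "ANALYST1": {"ADD_METADATA", "VALIDATE", "REPORT"},
    "ADMIN1": {"ADD_METADATA", "CHANGE_METADATA", "DELETE_METADATA",
               "VALIDATE", "REPORT"},
}

# Hypothetical table of source file/model pairings that are meaningful.
VALID_PAIRS = {("EMF", "PRIM"), ("EMF", "PPPM")}


def verify_transaction(access_code, transaction, source_file=None, model=None):
    """Sub-function 1.1: True for a valid transaction, False to reject it."""
    if transaction not in AUTHORIZED.get(access_code, set()):
        return False    # user is not authorized for this process
    if transaction == "VALIDATE" and (source_file, model) not in VALID_PAIRS:
        return False    # e.g. an enlisted file offered to an officer model
    return True


def transfer_control(transaction, handlers):
    """Sub-function 1.3: pass control to the process the user selected."""
    return handlers[transaction]()
```

Restrictions within authorized processes fall out of the table: in
this sketch ANALYST1 may add metadata but may not change or delete it.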

- 2.0 MAINTAIN METADATABASE: This function creates new
metadatabase entries, deletes metadatabase contents,
and makes changes to the existing metadatabase.

- 2.1 CONTROL: This sub-function displays the
metadatabase menu, and governs the activation and
sequence of add, change and delete processes.

- 2.1.1 PROVIDE METADATABASE MENU: This module displays
a menu giving the user options of adding, deleting or
changing metadata.

- 2.1.2 TRANSFER CONTROL: This module passes control to
either modules 2.2, 2.3, or 2.4, depending on user's
request and access authorization.

- 2.2 ADD METADATA: This sub-function reads metadata
input, checks it for duplication and proper entry
format, and either rejects the input or stores it in
the metadatabase.

- 2.2.1 READ ADD DATA: This module reads data which the
user desires to enter into the metadatabase.

- 2.2.2 CHECK UNIQUENESS: This module checks the
metadatabase to insure data to be added does not
already reside there.

- 2.2.3 CHECK FORMAT: This module checks data to be
added for compliance with prescribed standard metadata
entry formats.

- 2.2.4 ACCEPT DATA: This module evaluates results of
module 2.2.2 and 2.2.3 processing, and either rejects
data to be added or stores it in the metadatabase.

- 2.3 DELETE METADATA: This sub-function reads a metadata
deletion request, locates the data in the metadatabase
and removes it.

- 2.3.1 READ DELETE REQUEST: This module reads the
user's request to delete data.

- 2.3.2 LOCATE METADATA: This module locates indicated
metadata in the metadatabase.

- 2.3.3 REMOVE METADATA: This module removes metadata
from the metadatabase after a re-verification of the
user's desire to delete the data.

- 2.4 CHANGE METADATA: This sub-function reads a
metadata change request, locates the data to be
changed, and updates the data after verification that
the new metadata meets the prescribed entry format.

- 2.4.1 READ CHANGE REQUEST: This module reads the
user's request to update existing metadata.

- 2.4.2 LOCATE METADATA: This module locates the
metadata to be changed.

- 2.4.3 UPDATE METADATA: This module replaces old
metadata with new metadata.
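
The add, delete and change processes of function 2.0 amount to a
small keyed store with uniqueness and format checks. The Metadatabase
class, its method names, and the REQUIRED_FIELDS "entry format" are
assumptions for this sketch; the prescribed metadata entry formats
are not specified in the text.

```python
class Metadatabase:
    # Assumed stand-in for the "prescribed standard metadata entry format".
    REQUIRED_FIELDS = {"type", "min_len", "max_len"}

    def __init__(self):
        self._entries = {}

    def _format_ok(self, entry):
        return self.REQUIRED_FIELDS.issubset(entry)

    def add(self, name, entry):
        if name in self._entries:            # 2.2.2 Check Uniqueness
            return False
        if not self._format_ok(entry):       # 2.2.3 Check Format
            return False
        self._entries[name] = dict(entry)    # 2.2.4 Accept Data
        return True

    def delete(self, name, confirmed):
        # 2.3.3 removes only after the user's request is re-verified.
        if name not in self._entries or not confirmed:
            return False
        del self._entries[name]
        return True

    def change(self, name, entry):
        # 2.4 updates only if the entry exists and the new data is well formed.
        if name not in self._entries or not self._format_ok(entry):
            return False
        self._entries[name] = dict(entry)
        return True

    def get(self, name):
        return self._entries.get(name)
```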

- 3.0 PRODUCE EVR: This function produces edit and
validation rules for use by sub-function 4.3. Metadata
values are extracted from the metadatabase and are
transformed into bounded conditional statements through
which input data will be run.

- 3.1 CONTROL: This sub-function governs the activation
and sequence of processes involved with the production
of edit and validation rules.

- 3.2 ACCEPT PROCESSING CODES: This sub-function reads
the source file and model codes entered by the user,
opens appropriate metadata files, extracts applicable
metadata values, and stores them in a "variables" file.

- 3.2.1 READ SOURCE FILE/MODEL CODES: This module reads
the source file and model identification codes entered
earlier by the user.

- 3.2.2 OPEN METADATA FILE(S): This module identifies
and opens all metadata files containing data relating
to source file and models noted by module 3.2.1.

- 3.2.3 EXTRACT PERTINENT DATA VALUES: This module
extracts pertinent metadata values from opened
metadatabase files and stores the data in a "variables"
file.

- 3.3 FORMULATE EVR: This sub-function reads the
metadata values stored in the "variables" file into a
file of pre-established conditional statements, thereby
setting switches either on or off and setting upper and
lower boundaries of acceptable input data values.
(Settings and boundaries will therefore vary according
to the combination of source file and model codes
presented by the user.)

- 3.3.1 LOAD VARIABLES: This module reads the
"variables" file into a file of pre-set conditional
statements.

- 3.3.2 SET SWITCHES: This module, depending on variable
values, sets switches either on or off and establishes
upper and lower boundaries, as required.
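
A minimal sketch of function 3.0 follows, under the assumption that
the "variables" file can be modeled as a dictionary and each
pre-established conditional statement as a range check with an on/off
switch. The keys, codes and function names are hypothetical.

```python
# Hypothetical metadatabase rows: (source file, model, element) -> values.
METADATABASE = {
    ("EMF", "PRIM", "AGE"): {"check_range": True, "low": 17, "high": 65},
    ("EMF", "PPPM", "AGE"): {"check_range": False},
}


def retrieve_metadata(source_file, model):
    """Sub-function 3.2: gather the values for one source file/model pair."""
    return {
        element: values
        for (sf, m, element), values in METADATABASE.items()
        if sf == source_file and m == model
    }


def formulate_evr(variables):
    """Sub-function 3.3: load variables, set switches and boundaries."""
    evr = {}
    for element, values in variables.items():
        evr[element] = {
            "range_switch": values.get("check_range", False),  # on or off
            "low": values.get("low"),                           # lower boundary
            "high": values.get("high"),                         # upper boundary
        }
    return evr
```

The same element yields a different rule for each model because the
switch and boundaries come from the metadata, not from the rule
template.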

4.0  VALIDATE  INPUT  DATA:  This  function  actually 
performs  the  validation  by  selecting  specific  EVP, 
applying  these  IVS  to  the  input  data,  and  providing  the 
processed  input  data  to  cither  a  "validated  data"  file 
or  an  "error"  file.  This  function  also  caintair.s 
statistics  on  the  number  of  data  items  processed  ani 
the  number  and  category  of  errors  found. 

4.1  CONTROL:  This  sub-function  governs  the  activation 
and  sGcueiiCe  of  processes  involved  in  the  actual 
validation  of  input  lata. 

4.2  SELECT  EVR:  This  s ub- funct ion  identifies  the  t\\e 
cf  record (s)  being  validated  from  the  source  file,  and 
activates  only  those  EVR  whici;  apply.  (This 

s ub- f unction  precludes  the  valilation  program  from 
unnecessarily  running  an  input  recor 1  past  all  source 
file  EVR,  thereby  enhancing  run-time  efficiency  of  the 
overall  process.) 

4.2.1  nrTER’'TN2  INPUT  EECCFD  TYr’ES;  This  nodule 
identifies  the  subset  of  records  t.iat  are  being 
validated  from  the  source  input  file. 

4.2.2  EXTRACT  APPLICABLE  EVP:  "his  nodule  extracts 
only  those  EVR  applicable  to  the  record  types  being 
validated . 


4.3  APPLY  EVE;  Thi-s  sub-function  reuds  ti^a  input  data 
and  its  associated  EVE,  and  compares  the.?,  to  verify 
cc ipliance, 

4.3.1  EEAD  INPUT  DATA:  This  module  s e>juant  ialiy  rca  3s 
input  data  to  be  validated. 

4.3.2  PEAD  EVE;  This  module  reads  EVP  from  module 

4.2.2. 

4.3.3  CHECK  PAEANSTEuS;  This  nodule  compares  input 
data  to  EVE  parameters,  assigning  an  appropriate  error 
code  (including  '’no  error'*). 

4.4 PROVIDE PROCESSED INPUT DATA: This sub-function
reads the processed data and its error code, and
transfers the data accordingly.

4.4.1 READ ERROR CODE: This module reads the data and
associated error code from module 4.3.3.

4.4.2 TRANSFER ERRONEOUS DATA/ERROR CODE: This module
transfers erroneous data with its associated error code
to an "error" file.

4.4.3 TRANSFER VALID DATA: This module transfers all
valid input data to a "validated data" file.

4.5 MAINTAIN STATISTICS: This sub-function maintains a
running count of the number of transactions processed
and the number and type of errors found.

4.5.1 MAINTAIN TRANSACTION COUNT: This module
maintains a running count of the number of transactions
processed in a validation activity.

4.5.2 MAINTAIN ERROR COUNT: This module counts the
number of errors found and notes the error code
involved.

4.5.3 SORT ERROR TYPES: This module sorts a validation
activity's error count by type of error.
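
The decomposition of function 4.0 can be pictured with a
minimal sketch. It is an illustration only and not the
DBASE II program proposed in this thesis: the rule entries,
field names, record types and error codes below are all
hypothetical, and the "files" are plain in-memory lists.

```python
from collections import Counter

NO_ERROR = "E00"  # hypothetical code meaning "no error" (4.3.3)

# Hypothetical EVR metadata: one entry per rule, tagged with a record type.
EVR = [
    {"record_type": "A", "field": "grade",
     "check": lambda v: v in {"E1", "E2", "E3"}, "error": "E01"},
    {"record_type": "A", "field": "years",
     "check": lambda v: 0 <= v <= 40, "error": "E02"},
    {"record_type": "B", "field": "mos",
     "check": lambda v: len(v) == 3, "error": "E03"},
]


def select_evr(record_types):
    """4.2: keep only the rules for record types present in the input."""
    return [rule for rule in EVR if rule["record_type"] in record_types]


def check_parameters(record, rules):
    """4.3.3: return the first failing rule's error code, else NO_ERROR."""
    for rule in rules:
        if rule["record_type"] != record["type"]:
            continue
        value = record.get(rule["field"])
        # A missing field is treated as a failure of that rule.
        if value is None or not rule["check"](value):
            return rule["error"]
    return NO_ERROR


def validate(records):
    """4.1: control 4.2 through 4.5 over one batch of input records."""
    rules = select_evr({record["type"] for record in records})
    valid, errors, stats = [], [], Counter()
    for record in records:                      # 4.3.1
        stats["processed"] += 1                 # 4.5.1
        code = check_parameters(record, rules)  # 4.3.3
        if code == NO_ERROR:
            valid.append(record)                # 4.4.3
        else:
            errors.append((record, code))       # 4.4.2
            stats[code] += 1                    # 4.5.2
    return valid, errors, stats
```

Calling sorted(stats.items()) on the returned counter gives
the error counts grouped by code, which corresponds to
module 4.5.3.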

5.0 GENERATE REPORTS: This function accepts requests
for both printed reports and interactive (terminal)
responses, determines and retrieves the appropriate
report/response data, performs calculations and
formatting as required, and issues the requested
report/response.

5.1 CONTROL: This sub-function governs the activation
and sequence of processes involved with the production
of printed reports and interactive response to terminal
queries.

5.2 RETRIEVE REPORT/RESPONSE DATA: This sub-function
determines the type of report/response desired and
reads required data from appropriate files.

5.2.1 DETERMINE REPORT/RESPONSE TYPE: This module
interprets the user request for information in terms of
report/response content.

5.2.2 READ APPLICABLE DATA: This module locates, reads
and temporarily stores the data needed for the
requested report/response.

5.3 PERFORM CALCULATIONS: This sub-function determines
whether calculations are required to produce desired
information, and if so, it reads the appropriate data
and performs the required operations, producing "new"
report/response data.

5.4 PROVIDE REPORT: This sub-function determines the
appropriate report/response format, formats the data
accordingly, and transfers the formatted data to the
appropriate output device.

5.4.1 DETERMINE REPORT FORMAT: This module determines
the format required for the desired response, in
accordance with pre-established format parameters.

5.4.2 FORMAT DATA: This module arranges data in proper
format.

5.4.3 TRANSFER TO OUTPUT DEVICE: This module transfers
the formatted data to the appropriate output device.
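
As a rough illustration of how sub-functions 5.1 through 5.4
fit together, the sketch below produces one report type from
a set of validation statistics. The statistics, the report
name "error_summary" and the column layout are assumptions
made for the example; they do not come from the DCSPLANS
design.

```python
import io

# Hypothetical statistics, e.g. as left behind by a validation run.
STATS = {"processed": 200, "errors": {"E01": 6, "E02": 14}}


def retrieve(report_type):
    """5.2: pick the data the requested report type needs."""
    if report_type == "error_summary":
        return sorted(STATS["errors"].items())
    raise ValueError(f"unknown report type: {report_type}")


def calculate(rows):
    """5.3: derive "new" data, here each error's share of all records."""
    total = STATS["processed"]
    return [(code, count, 100.0 * count / total) for code, count in rows]


def format_rows(rows):
    """5.4.2: arrange the data in a fixed-width layout."""
    lines = [f"{'CODE':<6}{'COUNT':>6}{'PCT':>8}"]
    lines += [f"{code:<6}{count:>6}{pct:>8.1f}" for code, count, pct in rows]
    return "\n".join(lines)


def generate_report(report_type, device):
    """5.1: retrieve, calculate, format, then write to the output device."""
    device.write(format_rows(calculate(retrieve(report_type))) + "\n")
```

Any object with a write method can stand in for the output
device, for example sys.stdout for a terminal response or an
open file for a printed report.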


C.  "DATA FILTER" IMPLEMENTATION


Two key advantages inherent in the proposed local data
validation system concept are low development costs and
speedy implementation. In this light, initial DCSPLANS
development efforts must focus on the creation of a
prototype system that takes maximum advantage of existing
resources. Specifically, the DCSPLANS prototype must
incorporate the existing fJTSACS program which extracts
relevant Enlisted Master File (EMF) data, the existing DBASE
II data dictionary which currently includes general model
and office metadata in its metadatabase, and the existing
DCSPLANS IBM PC microcomputer. The DCSPLANS local data
filter system therefore will consist of an IBM PC based,
DBASE II program which filters EMF input data for use in two
application models (two models must be used to test the
system's ability to differentiate between the degrees of
validation required by separate models using the same input
data source file).

The following steps suggest a methodology for
development of the initial DCSPLANS prototype "data filter"
system.

1. Determine and implement the proper interface
mechanism for feeding PTIACS extracted EMF data
through the IBM PC data filter system.

2. Expand current data dictionary capabilities by
creating additional metadatabase modules which will
accept and store metadata about source file and model
data elements. Create an additional data dictionary
metadatabase module that will accept and store EVR.

3. Using the Phase II checklist from chapter four,
comprehensively construct data definitions for EMF
and model data elements, and create the EVR metadata
which sets data element validation parameters and
interrelationships. This must be accomplished
with the full, constant cooperation of those DCSPLANS
personnel most closely acquainted with the EMF and
the two application models being used for the
prototype.

4. Load the data definition and EVR metadata into the
data dictionary metadatabase.

5. Using the functional modules from section B of this
chapter as a guide (particularly function 4.0),
create an edit/validation program which will control
and implement the overall data filter process.

The development methodology presented above is based
upon a limited on-site review of DCSPLANS operations. A
more comprehensive examination of the DCSPLANS environment
(see Phase I of the planning and initial design process
described in chapter four) will most likely uncover some
additional requirements and necessary adjustments.
Therefore a detailed on-site environmental review is an
essential prerequisite to any DCSPLANS data filter
development/implementation effort, especially one being
undertaken by non-DCSPLANS personnel.


VI.  CONCLUSIONS AND RECOMMENDATIONS


A.  CONCLUSIONS

DCSPLANS, MILPERCEN suffers from a data control problem
common to many small user groups in large data processing
systems. It is unable to verify the correctness of input
data obtained from sources outside its span of control. At
the present time, DCSPLANS must rely almost exclusively on
the competence of its outside sources to guarantee the
integrity of its input data. The situation is causing
DCSPLANS' managers a great deal of concern.

Top-level Army decision-makers use output from DCSPLANS
applications to formulate long-range personnel management
policies. Thus, the adverse impact of erroneous input data
entering DCSPLANS' models can be far-reaching and extremely
serious. Despite this fact, DCSPLANS' small size relative
to the overall MILPERCEN information processing system
precludes it from strongly influencing the adoption of a
system-wide validation capability. DCSPLANS must therefore
develop and implement a "local" solution to its data
validation problem.

DCSPLANS' models and their associated input source file
contain many of the same data items. Additionally, a
variety of relationships exists among the input data. This
situation demands that DCSPLANS use a variety of validation
techniques to ensure the accuracy of data used by its
models. In addition to routine format checks, a series of
reasonableness checks is also needed to guarantee that
input is both complete and consistent. Reasonableness
checks are more complex than the format checks, and are, in
fact, the real key to ensuring a truly integrated validation
process (i.e., data elements, records and files are not only
valid by themselves, but also in relation to other relevant
elements, records and files). Of course, validation of the
legality and proper sequencing of an input activity itself
must precede the validity checks on the data.
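
The difference between the two kinds of check can be shown
with a small sketch. The field names, the grade pattern and
the minimum-age rule below are hypothetical and serve only
to illustrate the distinction; they are not actual EMF
validation criteria.

```python
import re


def format_check(record):
    """Format check: each field, taken alone, has the expected shape."""
    return (
        re.fullmatch(r"E[1-9]", record["grade"]) is not None
        and re.fullmatch(r"\d{4}", record["birth_year"]) is not None
        and re.fullmatch(r"\d{4}", record["entry_year"]) is not None
    )


def reasonableness_check(record, min_age=17):
    """Reasonableness check: fields must also agree with one another.

    Assumed rule for illustration: entry into service cannot occur
    before the soldier reaches min_age.
    """
    age_at_entry = int(record["entry_year"]) - int(record["birth_year"])
    return age_at_entry >= min_age
```

A record with birth year 1960 and entry year 1965 passes the
format check, since both fields are well formed, but fails
the reasonableness check because the two values are
inconsistent with each other.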

An ideal validation tool for DCSPLANS is the active data
dictionary. Configured as a data filter, the dictionary
provides a flexible, user-friendly, easily expandable
validation system for a "small" user group. The data filter
can be developed locally using the expertise currently
available within DCSPLANS. Such local development allows
the data filter system to be tailored precisely to DCSPLANS'
own validation needs. The data dictionary approach permits
quick, easy adaptation of the data filter to changes in
models and input data source files by simply adjusting
dictionary metadata. No extensive validation program
re-writes will be required. Also, the use of a metadatabase
as a single source of data for building EVR provides a
ready-made mechanism for keeping the EVR consistent.
Lastly, an active data dictionary allows DCSPLANS to develop
future data processing tools/capabilities with relative ease
and minimal investments of time and money.
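
The point about adjusting metadata rather than rewriting
programs can be sketched as follows. The two model names,
the field names and the range limits are invented for the
example; the only idea taken from the text is that the
validation parameters live in the metadatabase as data,
separately for each model, while the checking code stays
fixed.

```python
# Hypothetical metadatabase: validation parameters stored as data, per model.
METADATA = {
    "model_1": {"years_of_service": {"min": 0, "max": 30}},
    "model_2": {
        "years_of_service": {"min": 0, "max": 20},
        "grade_level": {"min": 1, "max": 9},
    },
}


def failed_fields(record, model):
    """Return the fields of record that violate the model's stored ranges."""
    failures = []
    for field, limits in METADATA[model].items():
        value = record.get(field)
        # A missing value counts as a failure for that field.
        if value is None or not (limits["min"] <= value <= limits["max"]):
            failures.append(field)
    return failures
```

The same input record can therefore pass for one model and
fail for another, and relaxing or tightening a limit is a
change to one METADATA entry, with no change to
failed_fields.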

Preliminary planning is crucial to DCSPLANS' successful
development of the data filter. The overall DCSPLANS data
processing environment must be understood, and data
definition and associated validation requirements must be
comprehensively examined and carefully documented. Thorough
accomplishment of these first two phases of development
will provide a solid base for both preliminary and detailed
system design. Preliminary design should be accomplished
through a functional decomposition of major system
functions. These major functions must be derived from
analysis of phase one and two results, and must satisfy the
achievement of the specific goals and key objectives of the
DCSPLANS system.

B.  RECOMMENDATIONS

An effective DCSPLANS approach to its data validation
problem must key on the concepts/designs presented in this
thesis. It is recommended that:

1. DCSPLANS pursue an efficient "local" solution which
can be tailored to its specific needs, rather than
await or attempt to influence the adoption of an
organization-wide validation system.

2. the local solution applied by DCSPLANS be an active
data dictionary "data filter."

3. DCSPLANS begin development with a prototype system
that will validate Enlisted Master File (EMF) data
for use in two models. This approach tests the
system's ability to differentiate between the degrees
of validation required by different models using the
same source data file, and also takes advantage of
the existing CTPAC3 program (extracts relevant EMF
data). The prototype should use an easy-to-program,
easy-to-use relational database management system
with a simple query language facility (similar to
DBASE II).

4. DCSPLANS appoint a small project team to oversee the
data filter development. The team must conduct a
thorough on-site review of DCSPLANS environmental
characteristics and data definition/validation
criteria (Chapter Four) prior to revisions of the
general design (Chapter Five) and subsequent coding.
While detailed design and coding can be conducted
off-site (perhaps as a thesis project), the review of
environmental characteristics, data definition, and
validation criteria must be accomplished at DCSPLANS
by personnel familiar with DCSPLANS operations. The
checklists in chapter four provide comprehensive
guidelines for such an examination.